This paper presents a method for constructing human-robot interaction policies in settings where multimodality, i.e., the possibility of multiple highly distinct futures, plays a critical role in decision making. We are motivated in this work by the example of traffic weaving, e.g., at highway on-ramps/off-ramps, where entering and exiting cars must swap lanes in a short distance, a negotiation that is challenging even for experienced drivers due to the inherent multimodal uncertainty of who will pass whom. Our approach is to learn multimodal probability distributions over future human actions from a dataset of human-human exemplars and to perform real-time robot policy construction in the resulting environment model through massively parallel sampling of human responses to candidate robot action sequences. Direct learning of these distributions is made possible by recent advances in the theory of conditional variational autoencoders (CVAEs), whereby we learn action distributions conditioned simultaneously on the present interaction history and on candidate future robot actions, in order to account for response dynamics. We demonstrate the efficacy of this approach with a human-in-the-loop simulation of a traffic weaving scenario.
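To make the conditioning structure concrete, the sketch below shows one possible form of such a CVAE: a Gaussian-latent model whose encoder and decoder are both conditioned on an interaction-history feature vector and a candidate robot action sequence, with a `sample` method that draws many human-response futures in parallel for a single robot candidate. The class name, layer sizes, flattened sequence representation, and Gaussian latent are illustrative assumptions for brevity, not the architecture used in the paper.

```python
# Minimal sketch (assumed architecture) of a CVAE over future human actions,
# conditioned on interaction history and a candidate robot action sequence.
import torch
import torch.nn as nn

class ConditionalVAE(nn.Module):
    def __init__(self, hist_dim, robot_dim, human_dim, latent_dim=8, hidden=64):
        super().__init__()
        cond_dim = hist_dim + robot_dim  # conditioning: history + candidate robot actions
        # Encoder q(z | human future, conditioning)
        self.enc = nn.Sequential(
            nn.Linear(human_dim + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),  # outputs mean and log-variance
        )
        # Decoder p(human future | z, conditioning)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, human_dim),
        )
        self.latent_dim = latent_dim

    def forward(self, human_future, history, robot_candidate):
        # Training loss: reconstruction error plus KL to the unit-Gaussian prior.
        cond = torch.cat([history, robot_candidate], dim=-1)
        mu, logvar = self.enc(torch.cat([human_future, cond], dim=-1)).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        recon = self.dec(torch.cat([z, cond], dim=-1))
        recon_loss = ((recon - human_future) ** 2).sum(dim=-1).mean()
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1).mean()
        return recon_loss + kl

    @torch.no_grad()
    def sample(self, history, robot_candidate, num_samples):
        # Draw many human-response futures in parallel for one candidate robot plan,
        # mirroring the massively parallel sampling used during policy construction.
        cond = torch.cat([history, robot_candidate], dim=-1)   # shape (1, cond_dim)
        cond = cond.expand(num_samples, -1)
        z = torch.randn(num_samples, self.latent_dim)
        return self.dec(torch.cat([z, cond], dim=-1))

# Usage with hypothetical feature dimensions: 1024 sampled human futures for one
# candidate robot action sequence, given the current interaction history.
model = ConditionalVAE(hist_dim=32, robot_dim=20, human_dim=20)
futures = model.sample(torch.zeros(1, 32), torch.zeros(1, 20), num_samples=1024)
```

In this sketch the robot planner would score each candidate action sequence against its batch of sampled human responses; how those scores are computed and optimized is part of the policy-construction procedure described in the paper, not shown here.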